Abstract

Simple theoretical concepts and models have been helpful for understanding the
folding rates and routes of single-domain proteins. As reviewed in this article,
a physical principle that appears to underlie these models is loop closure.

1 Topology and loop closure

The topic of this review is the relation between the folding kinetics of proteins 
and their three-dimensional, native structures. Central questions concerning 
the folding kinetics are: How do proteins fold into their native structures, and 
what are the rates and routes of folding? Since their discovery in 1991 [1],
two-state proteins have been the focus of experimental studies [2-5]. These
proteins fold from the denatured state to the native state without
experimentally detectable intermediate states. The size of most two-state
proteins is rather similar, roughly between 60 and 120 residues, with a few
smaller or larger exceptions [2,4-6]. Nonetheless, their folding rates range
over six orders of magnitude: the fastest proteins fold on a microsecond [7,8]
and, if designed for speed, sub-microsecond time scale [9,10], whereas slow
two-state proteins fold on a time scale of seconds [11]. In 1998, Plaxco,
Simons, and Baker [12] discovered that these folding rates correlate with a
simple measure of the structural 'topology', the relative contact order (CO).
The relative CO is the average sequence separation |i - j| of all contacts
between amino acids i and j in the native structure, divided by the chain
length. Proteins with many local contacts and, hence, small relative CO tend
to fold faster than proteins with many nonlocal, sequence-distant contacts and
large relative CO. The discovery of Plaxco et al. pointed towards a 'surprising
simplicity' [13] in protein folding kinetics. The folding kinetics problem,
i.e. the problem of predicting folding rates and routes from native structures,
appeared to be considerably simpler than the structure problem, the prediction
of native structures from sequences, which requires detailed atomistic
models [14].

The physical principle that underlies the correlation between folding rates and 
relative CO seems to be loop closure. Contacts with small CO can be formed 
by closing a small loop, which is fast and requires a small amount of loop- 
closure entropy, compared to closing a large loop [15,16]. It seems plausible 
that protein structures with many local contacts form faster than proteins with 
more complex structures involving many nonlocal contacts, provided that the 
loop-closure entropies, or chain entropies, dominate over sequence-dependent 
interaction energies in the folding process. The strength of the correlation be- 
tween folding rates and relative CO and related structural measures discussed 
in section 3 indicates such a dominance of topological or loop-closure aspects, 
at least for a majority of proteins. Depending on the considered set of two-state
proteins, the absolute values |r| of the Pearson correlation coefficients between
folding rates and relative CO of two-state proteins vary between 0.75 and 0.9
(see Table 1). The squares of these correlation coefficients range roughly from
0.6 to 0.8, which indicates that between 60 % and 80 % of the observed
variations in the folding rates can be traced back to simple aspects of the
overall structure or topology, rather than sequence-specific energetic aspects.

Several experimental observations support the importance of protein topol- 
ogy and loop closure. First, insertion of small loops into turns of the protein 
structure slows down folding [15,17-19]. Second, inserting covalent crosslinks 
into the protein chain speeds up the folding process [19-23]. The crosslinks 
interconnect the chain and increase the localness of some of the contacts in the 
protein structure. Third, single-residue mutations that locally perturb ener- 
getic interactions typically have a 'less than tenfold effect' [13] on the folding 
rate, which appears small compared to the variations in folding rate observed 
for two-state proteins. For a few single-residue mutants, larger changes in the
folding rate have been observed [7,24]. Also, homologous proteins of the same 
size, which have the same structure but can differ considerably in sequence, 
have folding rates that differ typically by less than one or two orders of mag- 
nitude [2,25,26], which appears, again, small compared to the six orders of 
magnitude observed for two-state proteins. 

Can we predict folding routes from loop-closure principles? The CO or
sequence separation of a contact is the length of the loop that has to be closed
to form the contact, provided that no previously formed contacts
'short-circuit' the chain. In other words, the CO measures loop lengths
for the fully unfolded state of the protein chain. But during folding, other
contacts may have been formed prior to a specific contact between residues i
and j. The actual length of the loop that has to be closed to form this contact
in the partially folded state of the protein chain can be estimated via the
graph-theoretical concept of effective contact order (ECO) [27,28]. The ECO
is the length of the shortest path between the two residues i and j that are
brought into contact, see Fig. 1. The steps on this path are either bonds between
neighboring residues in the chain, or contacts between residues that have been
formed previously, such as contact C1 between the residues k and l in Fig. 1. In
contrast to COs, the ECOs thus are route-dependent: they depend on the
sequence in which contacts are formed. On the minimum-ECO routes discussed
in section 4, proteins fold, or 'zip up', in sequences of events that involve
only closures of small loops, which minimizes the entropic loop-closure
barriers during folding. The minimum-ECO routes help to understand the shape
of Φ-value distributions from mutational analyses of the folding kinetics, see
section 5.
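
The shortest-path definition of the ECO can be illustrated with a breadth-first
search on a graph whose edges are the chain bonds and the previously formed
contacts. The following is a minimal sketch, not the implementation used in the
cited works; the function name and the residue indices (i = 0, k = 2, l = 8,
j = 10, chosen to mimic the situation described for Fig. 1) are assumptions made
for illustration.

```python
from collections import deque


def effective_contact_order(i, j, n_residues, formed_contacts):
    """Length of the shortest path between residues i and j.

    Steps are either chain bonds between neighboring residues or
    contacts that have been formed previously (hypothetical helper).
    """
    # Adjacency list: covalent chain bonds ...
    neighbors = {r: set() for r in range(n_residues)}
    for r in range(n_residues - 1):
        neighbors[r].add(r + 1)
        neighbors[r + 1].add(r)
    # ... plus previously formed contacts that short-circuit the chain.
    for a, b in formed_contacts:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Breadth-first search from residue i until residue j is reached.
    distance = {i: 0}
    queue = deque([i])
    while queue:
        r = queue.popleft()
        if r == j:
            return distance[r]
        for s in neighbors[r]:
            if s not in distance:
                distance[s] = distance[r] + 1
                queue.append(s)
    raise ValueError("residues are not connected")


# Fully unfolded chain: the ECO equals the CO |i - j|.
co = effective_contact_order(0, 10, 11, [])
# Contact between residues 2 and 8 formed previously: the loop is shortened.
eco = effective_contact_order(0, 10, 11, [(2, 8)])
print(co, eco)  # 10 5
```

With no contacts formed, the function returns the sequence separation, so the CO
appears as the special case of the ECO for the fully unfolded chain.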



2 Contact maps, contact clusters, and topology 

To capture the concept of native-state topology more precisely, it is helpful 
to consider native contact maps. Contact maps are two-dimensional
representations of three-dimensional protein structures. The native contact map
of a protein is a matrix in which element (i, j) indicates whether the residues
i and j are in contact in the native structure. To some extent, the native
contact map depends on the contact definition. In the map of Fig. 3(a), two
residues are defined to be in contact if the distance between their backbone
Cα atoms is smaller than the cutoff distance 7 Å chosen here, and in Fig. 3(b),
if the distance between any of their non-hydrogen atoms is smaller than the
cutoff distance 4.5 Å. In the Cα contact map of Fig. 3(a), the contacts are
arranged in clusters that correspond to the characteristic structural elements
of the protein CI2 (see Fig. 2). These clusters are also present in the
non-hydrogen-atom contact map of Fig. 3(b). In addition, the non-hydrogen-atom
contact map contains more 'isolated' contacts that mostly correspond to
interactions of large sidechains, which are not represented in the
backbone-centric Cα contact map. A third type of contact map is shown in
Fig. 3(c). The different gray tones in this map indicate the numbers of
contacting non-hydrogen atom pairs of two residues. This contact map is the
basis for the calculation of the relative CO.
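
As a rough illustration of the Cα contact definition, the sketch below lists all
residue pairs whose Cα atoms are closer than a cutoff. The function name, the
default arguments, and the toy hairpin coordinates are hypothetical; the cutoff
of 7 Å and the exclusion of pairs with |i - j| ≤ 2 follow the definitions given
for Fig. 3(a).

```python
import math


def ca_contact_map(ca_coords, cutoff=7.0, min_separation=3):
    """Return the list of contacts (i, j) with i < j.

    ca_coords: list of (x, y, z) C-alpha positions in Angstrom.
    Pairs with |i - j| < min_separation are ignored, i.e. neighboring
    and next-nearest neighboring residues are not counted.
    """
    n = len(ca_coords)
    contacts = []
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
                contacts.append((i, j))
    return contacts


# Toy hairpin (made-up coordinates): residues 0-3 run along x,
# residues 4-7 run back at a distance of 5 Angstrom.
hairpin = [(0.0, 0, 0), (3.8, 0, 0), (7.6, 0, 0), (11.4, 0, 0),
           (11.4, 5, 0), (7.6, 5, 0), (3.8, 5, 0), (0.0, 5, 0)]
hairpin_contacts = ca_contact_map(hairpin)
print(hairpin_contacts)
```

The contacts of such a hairpin lie on a line perpendicular to the diagonal of the
contact map, which is the typical signature of an antiparallel strand pairing.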

The contact maps of Fig. 3 indicate the chain positions i and j of contacting
amino acids, but not which of the twenty different types of amino acids are
located at these positions. In other words, the contact maps do not contain
sequence information; they just contain information on the structure. This
structural information is rather detailed. Single-residue mutations can lead to
deletion or addition of contacts, and homologous proteins of the same size
can differ in many native contacts. Nonetheless, single-residue mutants and
homologous proteins have the same overall structure. To capture the overall
structure or 'structural topology' of a protein, it is helpful to take a more
coarse-grained view of contact maps and to focus on contact clusters, e.g. on
the clusters in the Cα contact map of Fig. 3(a). The size of contact clusters
may vary between wildtype and mutants of a protein, or between homologous
proteins of similar size. But the overall location of these clusters in the contact
map in general stays the same. The contact clusters thus capture the overall
structural topology of a protein.



3 Contact order, topological measures, and folding rates 

Simple measures of native-state topology are characteristic, average properties 
of contact maps. The relative CO defined by Plaxco et al. [12] is the average 
CO of all contacts between non-hydrogen atoms of the contact map shown 
in Fig. 3(c), divided by the chain length N. The CO of the contacting atoms 
simply is the sequence separation |i - j| of the two residues i ≠ j in which
the atoms are located. Depending on the data set, the obtained correlations
between relative CO and the folding rates of two-state proteins vary between
0.75 and 0.92, see Table 1. Proteins with many local contacts between residues
that are close in the chain sequence have a small relative CO and fold faster
than proteins with many nonlocal contacts and large relative CO. The
typically fast-folding α-helical proteins have small relative COs since their
contact maps contain many local, intra-helical contacts between residues i and
i + 3 or i + 4. Proteins with β-sheets, in contrast, have larger relative COs
and, on average, fold slower. But also within the classes of α-helical and
β-sheet-containing proteins, significant correlations between folding rates and relative
CO can be observed [29]. 
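
The definition of the relative CO translates directly into a few lines of code.
This is a minimal sketch under the assumption that the atom-atom contacts are
already available as a list of residue index pairs, with one entry per
contacting atom pair; the function name and the example numbers are hypothetical.

```python
def relative_contact_order(contacts, chain_length):
    """Average sequence separation |i - j| of all contacts,
    divided by the chain length N.

    contacts: list of (i, j) residue indices, one entry per
    atom-atom contact, so residue pairs with many contacting
    atoms enter with a correspondingly larger weight.
    """
    if not contacts:
        raise ValueError("at least one contact is required")
    separations = [abs(i - j) for i, j in contacts]
    absolute_co = sum(separations) / len(separations)
    return absolute_co / chain_length


# Made-up example: two local contacts and one nonlocal contact
# in a chain of 20 residues.
rel_co = relative_contact_order([(0, 4), (1, 5), (0, 10)], 20)
print(rel_co)
```

In this example, the separations 4, 4, and 10 give an absolute CO of 6 and,
after division by the chain length 20, a relative CO of 0.3.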

Related topological measures that correlate with the folding rates of two-state 
proteins are the 'long-range order' [30], the 'total contact distance' [31], and 
the number Q_D of nonlocal contacts with CO > 12 [32,33] (see Table 1). The
long-range order is the number of contacts with CO > 12, divided by the
chain length, and the total contact distance is the sum over the COs of all
contacts, divided by the chain length squared. The topomer-search model of
Makarov et al. [32,33] predicts that the number Q_D of nonlocal contacts is
proportional to log(k_f/Q_D), where k_f is the folding rate. The diffusive
search for a topomer [32-35], i.e. for the "set of unfolded conformations that 
share a common, global topology with the native state" [33], has been sug- 
gested as a physical principle that underlies the correlation between relative 
CO and folding rates [32,33,36]. Recent extensive simulations with an off- 
lattice model indicate, however, that an unbiased diffusive search process for a 
native topomer "would take an impossibly long average time to complete" [37]. 
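
The related measures defined above differ from the relative CO only in how the
sequence separations are counted and normalized. The sketch below follows the
definitions as stated in the text; the function names and the example contact
list are hypothetical, and details of the contact definitions used in the
original works are not reproduced here.

```python
def nonlocal_contact_count(contacts, threshold=12):
    """Q_D: number of nonlocal contacts with CO > threshold."""
    return sum(1 for i, j in contacts if abs(i - j) > threshold)


def long_range_order(contacts, chain_length, threshold=12):
    """Number of contacts with CO > threshold, divided by the chain length."""
    return nonlocal_contact_count(contacts, threshold) / chain_length


def total_contact_distance(contacts, chain_length):
    """Sum over the COs of all contacts, divided by the chain length squared."""
    return sum(abs(i - j) for i, j in contacts) / chain_length ** 2


# Made-up example with one local and two nonlocal contacts, N = 40.
example_contacts = [(0, 4), (2, 20), (5, 30)]
q_d = nonlocal_contact_count(example_contacts)
lro = long_range_order(example_contacts, 40)
tcd = total_contact_distance(example_contacts, 40)
print(q_d, lro, tcd)
```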

Can topological measures capture the increase in folding rate that is caused
by the insertion of covalent chain crosslinks [19-22]? Inserting crosslinks such
as disulfide bonds into the protein chain increases the localness of some of 
the native contacts, since the crosslinks 'short-circuit' the chain. The natural 
extension of the CO of a contact in a crosslinked chain is the minimum number 
of covalently connected residues between the two residues in contact. This 
minimum number is the ECO of the contact in the crosslinked but otherwise 
unfolded chain, and the relative ECO is the natural extension of the relative 
CO for a crosslinked chain. The relative ECO appears to overestimate changes 
in the folding rates caused by crosslinks [38]. But a closely related pair of 
measures, the relative logCO and relative logECO, captures the folding rates 
of two-state proteins with and without crosslinks [38]. The relative logCO is 
the average value for the logarithm of the CO of all contacts, divided by the 
logarithm of the chain length, and the relative logECO is the natural extension 
of this measure for crosslinked chains. 
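
As a small numerical illustration of these logarithmic measures, the sketch
below averages the logarithms of a list of loop lengths and divides by the
logarithm of the chain length. Passing COs gives the relative logCO, and
passing the ECOs of the crosslinked but otherwise unfolded chain gives the
relative logECO. The function name and the numbers are assumptions made for
illustration.

```python
import math


def relative_log_order(loop_lengths, chain_length):
    """Average logarithm of the loop lengths (COs or ECOs) of all
    contacts, divided by the logarithm of the chain length."""
    if not loop_lengths:
        raise ValueError("at least one contact is required")
    mean_log = sum(math.log(l) for l in loop_lengths) / len(loop_lengths)
    return mean_log / math.log(chain_length)


# Made-up example: two contacts with CO 4 and CO 16 in a chain of 64 residues.
rel_log_co = relative_log_order([4, 16], 64)
# A hypothetical crosslink that shortens the second loop from 16 to 4 residues.
rel_log_eco = relative_log_order([4, 4], 64)
print(rel_log_co, rel_log_eco)
```

Because of the logarithm, shortening a long loop by a crosslink changes the
measure less than it changes the relative ECO, which is consistent with the
observation that the relative ECO overestimates the effect of crosslinks.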



Do topological measures also predict folding rates of non-two-state proteins? 
Non-two-state proteins exhibit at least one metastable, experimentally de- 
tectable intermediate state during folding. The correlations between relative 
CO and folding rates of protein sets that contain both two-state and non- 
two-state proteins are insignificant (see Table 2). For these protein sets, the 
correlation coefficients between absolute CO and folding rates are much larger 
than the correlation coefficients for the relative CO [39], in contrast to
two-state proteins (see Tables 1 and 2). The absolute CO is the average CO of all
native contacts, without the chain-length-dependent rescaling factor 1/N of
the relative CO. Non-two-state proteins also exhibit strong correlations with 
the logarithm and the square root of the chain length N. 



There are also other simple models for protein folding rates. A conceptually 
somewhat different topological measure that correlates with protein folding 
rates is 'cliquishness' [40], which characterizes the overall clustering tendency 
of native contacts. In the past years, several groups have found that protein
folding rates correlate with secondary structure content [41-43] or secondary
structure propensity [44,45]. It has recently been suggested that secondary
structure determines protein topology [46]. Native-state topology and loop
closure thus may again be the principles that underlie the correlations between
secondary structural measures and folding rates. The topology and chain-length
dependence of folding rates has also been studied in lattice [47-51] and simple
off-lattice models of proteins [52-55]. In lattice models, relatively strong
correlations between folding rates and relative CO of model structures are
observed if energetic terms that increase the folding cooperativity are
included [47-49].

4 Folding routes and effective contact order

The correlations between protein folding rates and simple topological measures 
inspired the development of statistical-mechanical models based on native- 
state topology. These models can be grouped into three classes. First, there 
are models that use explicit representations of the protein chain with Go-type 
energy potentials [52,54-64], named after the Japanese physicist Nobuhiro 
Go [65]. In these potentials, amino acids that are in contact in the native 
structure attract each other, while amino acids not in contact in the native 
structure repel each other, irrespective, at least to some extent, of the physical 
interactions between the amino acids. The second class of models assumes 
that amino acids can be in either of two states: native-like structured, or 
unstructured [66-78]. Partially folded states then are described by sets of 
structured amino acids. These models are inspired by the Zimm-Bragg model 
for helix-coil transitions [79], which assumes that amino acids in helices can 
either be in a helix or a coil state. In the third class of models, partially 
folded states are characterized by the subset of native contacts formed in 
these states [80-84]. Other approaches that do not directly fall into one of
these three classes are the diffusion-collision model of Karplus and
Weaver [85-87] as well as free-energy-functional [88,89] and
perturbed-Gaussian-chain methods [90,91].

Folding routes can be predicted from native contact maps rather directly via
the concept of effective contact order (ECO). The ECO is an estimate for the
length of the loop that has to be closed to form a contact or contact cluster
in a partially folded chain conformation (see Fig. 1). The contact clusters
in a native contact map represent the characteristic structural elements of
a protein. MD simulations indicate increased correlations between contacts
of the same contact cluster [92]. In a coarse-grained view, individual folding
routes can be described by the sequence in which contact clusters are formed.
For the protein CI2, which has just four contact clusters, there are 4! = 24
possible sequences in which the clusters can be formed. The length of the loop
that has to be closed to form a contact cluster in general depends on the
sequence in which the clusters are formed. For example, the contact cluster
β1β4 in the contact map of Fig. 3(a) represents contacts between the two
terminal strands β1 and β4 of CI2. Forming this contact cluster from the
fully unfolded state, i.e. prior to the other three clusters, requires closing a
relatively large loop of length 42 (see Table 3). However, forming β1β4 after
the other three clusters α, β2β3, and β3β4 requires closing only a relatively
small loop of length 7. The reason is that the contacts of the clusters α, β2β3,
and β3β4 short-circuit the chain, which brings the two chain ends with the
strands β1 and β4 into closer spatial proximity. On the minimum-ECO
route [80,83], i.e. the folding route that minimizes the loop lengths and, thus,
the entropic loop-closure barriers, the cluster β1β4 is formed after the
clusters α, β2β3, and β3β4. On this route, α, β2β3, and β3β4 form in parallel
since the ECOs of these three clusters do not depend on the sequence in which
they are formed (see Fig. 4).

In general, there are two scenarios for two contact clusters (or structural ele- 
ments) A and B of a protein. In the first scenario, the ECOs (or loop lengths) 
of the contact clusters A and B do not depend on the sequence in which the 
clusters are formed. The two clusters then are predicted to form in parallel.
In the second scenario, the ECO of one of the two clusters, e.g. cluster
B, is significantly smaller if cluster A is formed prior to B. The clusters are 
then predicted to form sequentially, provided that the total loop-closure cost 
for cluster B along this route, which includes the loop-closure cost for cluster 
A, is smaller than on other routes [83]. An important point here is that the 
loop-closure dependencies between two contact clusters typically are strong in 
the second scenario, i.e. the differences in loop lengths are large if the sequence 
of events in which the clusters are formed is reversed. Therefore, simple es- 
timates of loop-closure entropies [80, 82] or minimization of loop lengths [83] 
are sufficient to derive the dominant minimum-ECO or minimum-entropy-loss 
routes. Detailed calculations show that the entropy loss for closing a loop in 
an unfolded chain is proportional to the logarithm of the loop length for large 
loops [16,93-96], while the closure of short loops with up to about 5 residues 
is impeded by the chain stiffness [95-98]. 
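
The logarithmic dependence of the loop-closure entropy on the loop length can be
made concrete with a short calculation. The sketch below assumes an entropy loss
of c ln(l) in units of k_B, up to a length-independent constant, with the
prefactor c = 3/2 of an ideal Gaussian chain; this prefactor is an assumption
for illustration, and other chain models give somewhat different values. The
loop lengths 42 and 7 are the ECOs for the strand pairing β1β4 of CI2 quoted
above.

```python
import math


def loop_closure_entropy_loss(loop_length, c=1.5):
    """Entropy loss for closing a loop, in units of k_B and up to a
    length-independent constant. Only meaningful for large loops; short
    loops of up to about 5 residues are dominated by chain stiffness.
    The prefactor c = 1.5 (ideal Gaussian chain) is an assumption."""
    return c * math.log(loop_length)


# Forming the terminal strand pairing of CI2 first (loop length 42)
# versus last on the minimum-ECO route (loop length 7).
extra_cost = loop_closure_entropy_loss(42) - loop_closure_entropy_loss(7)
print(round(extra_cost, 2))  # about 2.69 (in units of k_B)
```

Under this assumption, closing the large loop costs about 2.7 k_B more entropy
than closing the small loop, which corresponds to a factor of roughly
exp(2.7) ≈ 15 in the statistical weight of the closed loop.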



5 Φ-value distributions and topology



Several experimental methods provide information on folding routes. The char- 
acterization of metastable, partially folded states of non-two-state proteins 
gives direct information on folding intermediates, provided these metastable
states are 'on-route' to the native state, and not 'off-route' traps. Structural
information on these intermediates can be obtained with hydrogen-exchange 
or NMR methods [99-105]. Two-state proteins do not exhibit experimentally 
detectable, metastable intermediates during folding. Instead, the folding ki- 
netics of many two-state proteins has been investigated via mutational anal- 
ysis [11,106-124]. In a mutational analysis, a large number of mostly single- 
residue mutants of a protein is generated. For each mutant, the effect of the 
mutation on the folding dynamics is quantified by its Φ-value [3,125]

$$\Phi = \frac{RT \ln(k_{\mathrm{wt}}/k_{\mathrm{mut}})}{\Delta G_N} \qquad (1)$$

Here, k_wt is the folding rate for the wildtype protein, k_mut is the folding rate
for the mutant protein, and ΔG_N is the change of the protein stability induced
by the mutation. The stability G_N of a protein is the free energy difference
between the denatured state D and the native state N. In a more recent
method termed ψ-value analysis, divalent biHis metal-ion binding sites are
introduced into a protein, and the folding rate of the mutant protein is studied
as a function of the metal-ion concentration [126-129].
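
The Φ-value of a mutant, Φ = RT ln(k_wt/k_mut) / ΔG_N, can be evaluated with a
few lines of code. This is a minimal sketch; the function name, the choice of
kJ/mol as energy unit, the default temperature, and the example numbers are
assumptions for illustration and are not taken from any of the cited data sets.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)


def phi_value(k_wt, k_mut, delta_g_n, temperature=298.15):
    """Phi = RT ln(k_wt / k_mut) / delta_g_n.

    k_wt, k_mut: folding rates of wildtype and mutant (same units).
    delta_g_n:   mutation-induced change of the stability in kJ/mol.
    """
    if delta_g_n == 0:
        raise ValueError("Phi is undefined for a vanishing stability change")
    rt = GAS_CONSTANT * temperature
    return rt * math.log(k_wt / k_mut) / delta_g_n


# Made-up example: the mutation halves the folding rate and changes
# the stability by RT ln 4, which gives Phi = ln 2 / ln 4 = 0.5.
rt = GAS_CONSTANT * 298.15
phi = phi_value(k_wt=100.0, k_mut=50.0, delta_g_n=rt * math.log(4))
print(phi)
```

Since the stability change appears in the denominator, Φ-values of mutations
with small stability changes are sensitive to experimental errors, which is one
reason for considering average values over secondary structural elements.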

Φ-values have been calculated in statistical-mechanical models that are based
on native structures [57, 59, 61-63, 66-71, 91] and from Molecular Dynamics
unfolding simulations at elevated temperatures [130-132]. The detailed
modeling of Φ-values requires estimates for mutation-induced free energy changes
[133, 134], which goes beyond simple topology-based modeling. However, on a
more coarse-grained level, the level of average Φ-values for secondary
structural elements, important aspects of Φ-value distributions are captured by
native-state topology. An average Φ-value close to zero for a secondary
structural element (i.e. a helix or a β-strand) indicates that mutations in the
secondary element affect the folding rate only marginally, see eq. (1). In
contrast, a large average Φ-value indicates that mutations have a strong impact
on the folding rate. In a sense, the average Φ-values thus capture the 'kinetic
impact' of secondary elements. The kinetic impact can also be estimated from
minimum-ECO routes. The minimum-ECO route of the src SH3 domain is
shown in Fig. 5. Here, an arrow pointing from a contact cluster A to a cluster
B in the contact map indicates that A is formed prior to B. On the
minimum-ECO route, the contact cluster RT-β4 forms after β2β3 and β3β4, and the
cluster β1β5 after RT, β2β3, and β3β4. The clusters RT-β4 and β1β5 are nonlocal
clusters and form in parallel on this route. The other three large contact
clusters, the RT loop, an irregular, hairpin-like structure, and the two
β-hairpins β2β3 and β3β4, are local clusters. Local clusters contain contacts
with small CO and, thus, are located close to the diagonal of the contact map.

The kinetic impact of a contact cluster can be estimated from how often the
cluster appears along the minimum-ECO route to other clusters [83]. The
kinetic impact here is a semi-quantitative concept and can attain the values
high, medium, or low. The kinetic impact of β2β3 and β3β4, for example, is high
since these clusters appear on the route to both nonlocal clusters. The kinetic
impact of RT is medium since the cluster only appears on the route to β1β5.
The kinetic impact of β1β5 and, thus, of the strands β1 and β5 is low since this
cluster forms last. The kinetic impact derived from the minimum-ECO route
agrees with average Φ-values for the secondary elements (see Fig. 5). The
Φ-value distribution of the src SH3 domain is polarized, i.e. the average
Φ-values are large for some of the secondary elements (the strands β2, β3, and
β4), and small for others (the strands β1 and β5). The agreement between average
Φ-values and estimated kinetic impact shows that the polarized shape of the
Φ-value distribution can be understood from simple topology-based modeling.
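
The counting argument for the src SH3 domain can be written down as a short
script. The cluster dependencies below are those stated in the text for the
minimum-ECO route; the function name and the rule that maps counts to the labels
high, medium, and low are simplifying assumptions for illustration, not the
procedure of Ref. [83].

```python
# Clusters that are formed prior to each nonlocal cluster on the
# minimum-ECO route of the src SH3 domain (taken from the text).
FORMED_PRIOR = {
    "RT-β4": ["β2β3", "β3β4"],
    "β1β5": ["RT", "β2β3", "β3β4"],
}
CLUSTERS = ["RT", "β2β3", "β3β4", "RT-β4", "β1β5"]


def kinetic_impact(clusters, formed_prior):
    """Label each cluster by how often it appears on the route to
    other clusters (assumed rule: all routes -> high, some -> medium,
    none -> low)."""
    counts = {cluster: 0 for cluster in clusters}
    for prerequisites in formed_prior.values():
        for cluster in prerequisites:
            counts[cluster] += 1
    n_targets = len(formed_prior)
    labels = {}
    for cluster, count in counts.items():
        if count == n_targets:
            labels[cluster] = "high"
        elif count > 0:
            labels[cluster] = "medium"
        else:
            labels[cluster] = "low"
    return labels


impact = kinetic_impact(CLUSTERS, FORMED_PRIOR)
for cluster in CLUSTERS:
    print(cluster, impact[cluster])
```

With this rule, the two β-hairpins are assigned a high impact, the RT loop a
medium impact, and the cluster β1β5 that forms last a low impact.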

The Φ-value distributions of two-state proteins are either polarized or diffuse.
In a diffuse distribution, the average Φ-values for the secondary elements are
of similar magnitude and differ by no more than, say, a factor of 2 from each
other. A diffuse distribution of kinetic impact occurs, e.g., if all clusters
are involved on an 'equal footing' in the formation of a single rate-limiting
cluster. On the minimum-ECO route of CI2, for example, β1β4 forms after the
other three clusters, which results in a diffuse distribution of kinetic impact,
in agreement with the experimental Φ-value distribution [83]. A polarized
distribution, in contrast, occurs if some clusters have a central role on the
minimum-ECO route, such as the clusters β2β3 and β3β4 of the src SH3 domain.
Also for other two-state proteins, the diffuse or polarized shape of their
Φ-value distributions can be traced back to minimum-ECO routes and native-state
topology [83].
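
The distinction between diffuse and polarized distributions can be phrased as a
simple test on the average Φ-values of the secondary elements. The sketch below
uses the factor of 2 mentioned above as threshold; the function name, the
handling of non-positive averages, and the example values are hypothetical.

```python
def classify_phi_distribution(average_phi, max_ratio=2.0):
    """Return 'diffuse' if the average Phi-values of all secondary
    elements differ by no more than max_ratio, else 'polarized'.

    average_phi: dict mapping secondary elements to average Phi-values.
    Averages at or below zero are treated as polarized (assumption),
    unless all averages are identical.
    """
    values = list(average_phi.values())
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return "diffuse"
    if lowest > 0 and highest <= max_ratio * lowest:
        return "diffuse"
    return "polarized"


# Made-up averages for illustration only.
diffuse_case = classify_phi_distribution({"α": 0.3, "β1": 0.4, "β2": 0.5})
polarized_case = classify_phi_distribution({"β1": 0.1, "β2": 0.6, "β3": 0.7})
print(diffuse_case, polarized_case)
```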

Minimum-ECO routes also help to understand why 'topological mutations'
such as circular permutations can have a drastic impact on the Φ-value
distribution [80]. In a circular permutation, the chain ends of a protein are
covalently connected, and the chain is 'opened' at a different location
[20,135-140]. The protein still folds into the same structure [135], but the
'rewiring' of the protein chain changes the loop-closure connections between
the structural elements, and hence the minimum-ECO routes. The effect of
circular permutations on folding routes and Φ-value distributions has also been
studied with protein models that use explicit chain representations and Go-type
potentials [58,141].

The prediction of minimum-ECO routes is purely topology-based, i.e. it is 
based on native contact clusters and the entropic loop-closure dependencies 
between these clusters. Sequence-dependent, energetic effects may play a role 
if parallel processes on minimum-ECO routes have a similar loop-closure cost. 
The sequence-dependent energies then can lead to a situation where one of 
these processes dominates the folding kinetics [83]. 



6 Summary and outlook 

The kinetic protein folding principle considered in this article is loop closure. 
Evidence for such a principle comes from strong correlations between protein 
folding rates and topological measures, and from experiments in which the 
effect of loop insertions [15, 17-19] or crosslinks [19-23] on the kinetics is 
studied. Simple 'zipping' or minimum-ECO models that are based on the loop- 
closure relations between structural elements help to understand the shape of 
Φ-value distributions from mutational analysis of the folding kinetics [83].

We have focused here on kinetic models and measures that are based on
native-state topology. In the past years, the microsecond folding of ultrafast
proteins has also been studied in Molecular Dynamics simulations with physical,
atomistic force fields [142-145]. An ultimate goal is to model both native
structures and folding kinetics with such physical, sequence-based force fields.
Even if this
goal is achieved, the question remains whether there are simple principles that 
govern protein folding, such as the loop-closure principle considered here. In 
the next decade(s), detailed atomistic folding trajectories of two-state proteins 
may help to assess and establish such principles. 

Table 1: Correlation coefficients |r| between folding rates of two-state proteins and
simple topological measures

Authors             Ref.         Size of       relCO       absCO  LRO      TCD      Q_D      logCO
                                 protein set
Plaxco et al.       [146], [33]  24            0.92        -      -        -        0.88     -
Gromiha & Selvaraj  [30]         23            0.79(a)     -      0.78     -        -        -
Zhou & Zhou         [31]         28            (0.74)(b)   -      0.81     0.88     -        -
Micheletti          [40]         29            0.75        0.70   -        -        -        -
Ivankov et al.      [39]         30            0.75        0.51   -        -        -        -
Kamagata et al.     [6]          18            0.84        0.78   -        -        0.88     -
Dixit & Weikl       [38]         26            0.92        0.69   0.84(c)  0.90(c)  0.82(c)  0.90

Absolute values |r| of the Pearson coefficient for the correlations between folding
rates of several sets of two-state proteins and relative contact order (relCO) [12],
absolute contact order (absCO) [4], long-range order (LRO) [30], total contact
distance (TCD) [31], and logCO [38]. In case of the number of nonlocal contacts
Q_D [32], the given coefficients report the correlations between Q_D and
log(k_f/Q_D) where k_f are the folding rates. A dash indicates that no value is
given.

(a) calculated from table 1 of Ref. [30]

(b) the value is given in brackets since a slightly different definition for the relative
CO has been used

(c) the values have been calculated for the protein structures given in Ref. [38]. The
protein set is the set of Grantcharova et al. [4], which extends the set of Plaxco et
al. [146] by two proteins.

Table 2: Correlation coefficients |r| for folding rates of sets of two-state and
non-two-state proteins

Authors                    Ref.         Size of       relCO  absCO  ln(N)  N^(1/2)
                                        protein set
Ivankov et al., Li et al.  [39], [147]  57            0.1    0.74   0.72   0.71
Naganathan & Muñoz         [148]        69            -      -      -      0.74
Kamagata et al.            [6]          40            0.09   0.72   0.68   0.67

Absolute values |r| of the Pearson coefficient for the correlations between folding
rates of sets of two-state and non-two-state proteins and relative contact order
(relCO), absolute contact order (absCO), and the logarithm and square root of the
chain length N. The set of Naganathan and Muñoz [148] extends the set of Ivankov
et al. [39] by 12 proteins. The correlation coefficients for the set of Kamagata et al.
have been calculated from the data given in Tables 1 and 2 of Ref. [6]. A dash
indicates that no value is given.



Table 3: Loop lengths (ECOs) for forming the strand pairing β1β4 of CI2

Structural elements    Minimum ECO
formed prior           for β1β4
α + β2β3 + β3β4        7
α + β2β3               12
β2β3 + β3β4            19
α + β3β4               24
β2β3                   23
α                      31
β3β4                   36
none                   42

The given ECOs are the minimum ECOs among all contacts of the cluster β1β4.
The contact clusters are defined as in Ref. [83].

Fig. 1. Loop lengths in partially folded conformations of a protein chain can be
estimated via the graph-theoretical concept of effective contact order (ECO) [27,28].
The ECO of the contact C2 is the length of the shortest path between the two
residues i and j forming the contact. The 'steps' along this shortest path either are
covalent bonds between adjacent residues, or noncovalent contacts formed previously
in the folding process such as the contact C1. In this example, the ECO for the
contact C2 is 5, since the shortest path (shown in red) involves two steps from i to
k, one step for the contact C1 between k and l, and two steps from l to j. The contact
order (CO), in contrast, is the sequence separation |i - j| between the two residues,
the number of residues along the blue path between i and j. In this example, the
CO of the contact C2 is 10.

Fig. 2. The structure of the protein CI2 consists of an α-helix packed against a
four-stranded β-sheet [149].

Fig. 3. Native contact maps of the protein CI2 shown in Fig. 2, for different
contact definitions: (a) A black dot at position (i, j) indicates that the Cα
atoms of the residues i and j are within the cutoff distance 7 Å. The residue
numbers are specified along the two axes of the contact map. The four large
clusters of contacts represent the structural elements of CI2, i.e. the α-helix
and the three β-strand pairings β2β3, β3β4, and β1β4. - (b) Black dots indicate
that at least two non-hydrogen atoms of the residues i and j are within the
cutoff distance 4.5 Å. As above, contacts of neighboring or next-nearest
neighboring residues with |i - j| ≤ 2 (gray dots along the diagonal) are not
taken into account. - (c) The gray scale of the dots indicates the numbers of
non-hydrogen-atom pairs of two residues i and j that are within the cutoff
distance 6 Å. Black dots represent residue pairs for which more than 40
different non-hydrogen-atom pairs are in contact, lighter gray colors represent
residue pairs with fewer non-hydrogen-atom contacts.

Fig. 4. Minimum-ECO, or minimum-entropy-loss route of the protein CI2. Along
this route, the strand pairing β1β4 is formed after the other three structural
elements, the α-helix and the strand pairings β2β3 and β3β4. The route minimizes the
length of the loop that has to be closed to bring the two terminal strands β1 and
β4 into contact (see Table 3).

Fig. 5. (Top) Minimum-ECO route of the src SH3 domain. The arrows indicate
the sequences of events along this route. The red arrow pointing from the contact
cluster RT to the cluster β1β5, for example, indicates that the RT loop is formed
prior to the strand pairing β1β5. - (Bottom) Average experimental Φ-values [111]
for the β-strands and the RT loop (grey bars) and kinetic impact estimated from
the minimum-ECO route (black bars). The kinetic impact of the strands β2, β3,
and β4 is high since the clusters β2β3 and β3β4 are formed prior to both nonlocal
clusters RT-β4 and β1β5 [83]. The kinetic impact of RT is medium since the cluster
is formed prior to only one of the nonlocal clusters, β1β5. The kinetic impact of β1
and β5 is low since the cluster β1β5 forms last, parallel to RT-β4.